White matter hyperintensities (WMHs) are foci of abnormal signal intensity in white matter regions seen with magnetic resonance imaging (MRI). WMHs are associated with normal aging and have shown prognostic value in neurological conditions such as traumatic brain injury (TBI). The impracticality of manually quantifying these lesions limits their clinical utility and motivates the use of machine learning techniques for automated segmentation workflows. Herein, we develop a concatenated random forest framework with image features for segmenting WMHs in a TBI cohort. The framework is built upon the Advanced Normalization Tools (ANTs) and ANTsR toolkits. MR (3D FLAIR, T2-, and T1-weighted) images from 24 service members and veterans scanned in the Chronic Effects of Neurotrauma Consortium’s (CENC) observational study were acquired. Manual annotations were employed for both training and evaluation using a leave-one-out strategy. The relative volume difference between the manual and automated segmentations was 43 \(\pm\) 26%. In addition, three lesion size ranges are selected to illustrate the variation in performance with lesion size. Paired with correlative outcome data, supervised learning methods may allow for identification of imaging features predictive of diagnosis and prognosis in individual TBI patients.
The random forests framework [9] is a popular machine learning technique that has demonstrated significant utility for supervised segmentation tasks (e.g., normal human brain segmentation [10]) and other computer vision applications (e.g., [11]).
Random forests are conceptually straightforward [9].
In this work, we develop a concatenated random forest framework with a feature image set (both spatial and intensity-based) for segmenting WMHs in a large TBI cohort. The entire framework is built on the well-known open-source Advanced Normalization Tools (ANTs) and ANTsR toolkits. Further motivating this research is the availability of several large public imaging data sets that permit testing the reproducibility of this automated routine for WMH segmentation and quantification.
MR images utilized for this initial report were acquired from a single scanner involved in the Chronic Effects of Neurotrauma Consortium’s (CENC) observational study (see Walker et al., this issue). Briefly, participants were Operation Iraqi Freedom/Operation Enduring Freedom (OIF/OEF) era Service Members and Veterans between the ages of 18 and 60 years with prior combat exposure and deployment(s). The images analyzed here came from 26 subjects aged 39.6 \(\pm\) 8.1 years (range 28–58 years). Within this cohort, 24 (92%) were considered positive for TBI based upon the potential concussive events (PCE) interview process (described in detail in Walker et al., this issue).
Images were acquired on a Philips 3.0T Ingenia system with an 8-channel SENSE head coil (Philips Medical Systems, Best, Netherlands). 3D FLAIR images were acquired with a turbo spin echo inversion recovery sequence with the following parameters: repetition time (TR) = 4800 ms, echo time (TE) = 325 ms, inversion time (TI) = 1650 ms; 170 sagittal slices with a 1.2 mm slice thickness, 256 \(\times\) 256 acquisition matrix, and 256 \(\times\) 256 mm FOV. 3D T1-weighted images were acquired with a fast field echo (FFE) sequence with the following parameters: TR = 6.8 ms, TE = 3.2 ms, echo train length (ETL) = 240; flip angle = 9\(^\circ\), 170 sagittal slices with a 1.2 mm slice thickness, 256 \(\times\) 240 acquisition matrix, and 256 \(\times\) 256 mm FOV. In addition, 3D T2-weighted images were acquired with a turbo spin echo sequence with the following parameters: TR = 2500 ms, TE = 245 ms, ETL = 133; 170 sagittal slices with a 1.2 mm slice thickness, 256 \(\times\) 256 acquisition matrix, and 256 \(\times\) 256 mm FOV.
For the targeted application in this work, tissue classification is performed at the voxelwise level. In other words, each voxel within the region of interest is sent through the ensemble of decision trees and receives one classification vote from each tree, thus permitting a regression or classification solution. Since this procedure is performed at the voxelwise level, intensity information alone is insufficient for good segmentation performance due to the lack of spatial context. For example, as pointed out in [27], higher intensities can be found at the periventricular caps in normal subjects, which often confounds automated lesion detection algorithms. Other potential confounds include MR signal inhomogeneity and noise. Therefore, even though machine learning and pattern recognition techniques are extremely powerful and have significant potential, the creative construction and deployment of salient feature images, which we detail below, is just as crucial to the outcome.
Supervised methodologies are uniquely characterized, in part, by the feature images that are used to identify the regions of interest. In Table 2, we provide a list and basic categorization of the feature images used for the initial (i.e., Stage 1; more on the use of multiple random forest stages below) segmentation of the WMHs. In addition, Figure 3 provides a representation of a set of feature images for a single subject analyzed in this work. Note that in this work we categorize the brain parenchyma with seven labels: cerebrospinal fluid (CSF), gray matter, white matter, deep gray matter, brain stem, cerebellum, and white matter hyperintensities.
As mentioned previously, input for each subject comprises FLAIR, T1-, and T2-weighted acquisitions. The T1 and T2 images are rigidly registered to the FLAIR image using the open-source Advanced Normalization Tools (ANTs) [22]. The aligned images are then denoised using the algorithm of [29], corrected for intensity inhomogeneity with N4 bias correction [28], and normalized to the intensity range \([0,1]\). Although we could have used an alternative intensity standardization algorithm (e.g., [30]), we found that a simple linear rescaling produced better results, consistent with previous work [17].
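The linear rescaling to \([0,1]\) mentioned above can be illustrated with a minimal Python sketch. This is not the ANTs/ANTsR implementation; the function name `rescale_intensities` and the choice to map a constant image to zeros are assumptions made only for illustration, and the image is represented as a flat list of voxel intensities.

```python
def rescale_intensities(values):
    """Linearly map a flat list of voxel intensities onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant image carries no contrast, so map it to zeros.
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]


# Toy FLAIR intensities (hypothetical values, not study data).
flair = [120.0, 340.0, 560.0, 1000.0]
print(rescale_intensities(flair))
```

Because the mapping is linear, the relative ordering and spacing of intensities within each image are preserved; only the range changes.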
The T1 image is then processed via the ANTs brain extraction and normal tissue segmentation pipelines [24]. The result is a mask delineating the brain parenchyma and probabilistic estimates of the CSF, gray matter, white matter, deep gray matter, brain stem, and cerebellum [31]. These provide the expertly annotated labels for the first six tissue labels given above. Tissue prior probability maps for segmentation are from multi-modal optimal symmetric shape/intensity templates [17] created from the public MMRR data set [25] (cf. Figure 2).
Feature values include the preprocessed FLAIR, T1, and T2 image voxel intensities. We also calculate a set of neighborhood statistics (mean, standard deviation, and skewness) feature images using a Manhattan radius of one voxel given the typical size of individual WMHs. For each of the preprocessed images, we calculate the difference in intensities with the corresponding warped template component. Previous success in the international brain tumor segmentation competition [32] was based on an important set of intensity features that were created from the multi-modal templates mentioned previously [17] and listed in Table 2. We employ the same strategy here.
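As a rough illustration of the neighborhood statistics described above, the following Python sketch computes the mean, standard deviation, and skewness over a Manhattan-radius-one neighborhood. Several simplifications are assumptions of this sketch rather than details from the pipeline: it works on a 2D slice (the center voxel plus its four face neighbors) instead of a 3D volume, truncates the neighborhood at the image border, and uses population (biased) estimators. The function name `neighborhood_stats` is hypothetical.

```python
import math


def neighborhood_stats(image, row, col):
    """Mean, standard deviation, and skewness within Manhattan radius 1 (2D)."""
    # Center voxel plus the four neighbors at Manhattan distance one.
    offsets = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    values = [
        image[row + dr][col + dc]
        for dr, dc in offsets
        if 0 <= row + dr < len(image) and 0 <= col + dc < len(image[0])
    ]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    sd = math.sqrt(variance)
    # Skewness is undefined for a flat neighborhood; report 0.0 by convention.
    skewness = 0.0 if sd == 0 else sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    return mean, sd, skewness


# A single bright voxel surrounded by background (toy data).
toy_slice = [[0, 0, 0], [0, 5, 0], [0, 0, 0]]
print(neighborhood_stats(toy_slice, 1, 1))
```

Evaluating the function at every voxel would yield one feature image per statistic and per modality.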
To take advantage of the gross bilateral symmetry of the normal brain (in terms of both shape and intensity), and the fact that WMHs do not generally manifest symmetrically across hemispheres, we use the symmetric templates to compute the contralateral intensity differences as an additional intensity feature.
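A contralateral difference feature of this kind can be sketched in Python as follows. The sketch assumes the image has already been warped to the symmetric template so that the mid-sagittal plane coincides with the central column of each row; that alignment step is not shown, and the function name `contralateral_difference` is hypothetical.

```python
def contralateral_difference(slice_2d):
    """Subtract the left-right mirrored slice from the original, voxel by voxel."""
    return [
        [value - mirrored for value, mirrored in zip(row, reversed(row))]
        for row in slice_2d
    ]


# One row with a bright voxel on the right side only (toy data).
print(contralateral_difference([[1, 2, 5]]))
```

Voxels that match their mirror-image counterpart give values near zero, whereas a unilateral hyperintensity gives a positive value on the lesion side and a negative value on the opposite side.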
The segmentation probability images described above are used as feature images to provide a spatial context for the random forest model prediction step. Additional spatial contextual feature images include the distance maps [33] based on the CSF, gray matter, and deep gray matter images. These latter images are intended to help distinguish white matter hyperintensities from false positives induced by the partial voluming at the gray/white matter interface. A third set of images is based on the voxel location within the space of the template.
In previous brain tumor segmentation work [17], it was demonstrated that a concatenated supervised approach, whereby the prediction output from the first random forest model serves as partial input for a second random forest model, can significantly improve segmentation performance. We adopt the same approach here, employing two stacked random forests (or two “stages”). The Stage 1 feature images of the training data (as described previously) are used to construct the Stage 1 model. The training data Stage 1 features are then used to produce the voxelwise “voting maps” (i.e., the classification count of each decision tree for each tissue label) via the Stage 1 random forest model. All the Stage 1 features plus the Stage 1 voting maps are used as input to the Stage 2 model. In addition, we use the Stage 1 voting maps as tissue priors (i.e., probabilistic estimates of the tissue spatial locations) for a second application of the \(6\)-tissue segmentation algorithm with an additional Markov Random Field spatial prior (MAP-MRF) [31]. The resulting seven posterior probability images constitute a third additional feature image set for Stage 2.
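To make the notion of a voting map concrete, here is a minimal Python sketch for a single voxel. It assumes labels are numbered 1 to 7 and that the per-tree votes are already available as a list; both the numbering and the helper names `voting_map` and `stage2_features` are illustrative assumptions. The MAP-MRF posterior features that also enter Stage 2 are omitted.

```python
def voting_map(tree_votes, n_labels):
    """Count, for one voxel, how many trees voted for each label (1..n_labels)."""
    counts = [0] * n_labels
    for label in tree_votes:
        counts[label - 1] += 1
    return counts


def stage2_features(stage1_features, tree_votes, n_labels=7):
    """Append the Stage 1 vote counts to the Stage 1 feature vector."""
    return list(stage1_features) + voting_map(tree_votes, n_labels)


# Five hypothetical trees vote on one voxel; two Stage 1 feature values.
votes = [2, 2, 7, 3, 2]
print(voting_map(votes, 7))
print(stage2_features([0.4, 0.1], votes))
```

The Stage 2 model therefore sees, for every voxel, how confident the Stage 1 ensemble was about each tissue label in addition to the original features.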
As pointed out in a recent comprehensive lesion segmentation review [37], although the number of algorithms reported in the literature is quite extensive, there were only four publicly available segmentation algorithms at the time of writing this article. In contrast to the current work, none are based on supervised learning. As we did for our brain tumor segmentation algorithm [17], all of the code described in this work is publicly available through the open-source ANTs/ANTsR toolkits. Through ANTsR (an add-on toolkit which, in part, bridges ANTs and the R statistical project) we use the randomForest package [38] with the default settings, 2000 trees per model, and 500 randomly selected samples per label per image. Note that we saw little variation in performance when these parameters were changed (i.e., up to 1000 random samples and as few as 1000 trees), which is consistent with our previous experience.
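The per-label sampling described above can be sketched in Python as follows. This is an illustration only, not the ANTsR code: the function name `sample_voxels_per_label` is hypothetical, the label image is flattened to a list, and taking every voxel when a label has fewer than the requested number is an assumption of this sketch.

```python
import random


def sample_voxels_per_label(labels, n_per_label=500, seed=0):
    """Randomly pick up to n_per_label voxel indices for each label in one image."""
    rng = random.Random(seed)
    by_label = {}
    for index, label in enumerate(labels):
        by_label.setdefault(label, []).append(index)
    sampled = {}
    for label, indices in by_label.items():
        # Assumption: if a label has too few voxels, use all of them.
        k = min(n_per_label, len(indices))
        sampled[label] = sorted(rng.sample(indices, k))
    return sampled


# Toy label image: ten voxels of label 1 and three of label 2.
toy_labels = [1] * 10 + [2] * 3
print(sample_voxels_per_label(toy_labels, n_per_label=5))
```

Sampling a fixed number of voxels per label keeps the training set balanced across tissue classes even though WMH voxels are far rarer than, say, white matter voxels.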
In addition, similar to our previous offering, we plan on creating a self-encapsulated example to showcase the proposed methodology. The fact that the data will also be made available through the Federal Interagency Traumatic Brain Injury Research (FITBIR) repository along with the manual labelings will facilitate reproducibility on the part of the reader as well as any interest in extending the proposed framework to other data sets.
In order to evaluate the protocol described, we performed a leave-one-out evaluation using the data acquired from the 24 subjects described above. Initial processing included the creation of all Stage 1 feature images for all subjects. The initial brain segmentation of each T1 image and the manual white matter hyperintensity tracings were combined to provide the truth labels for the training data. The “truth” labels are the seven anatomical regions given above.
The leave-one-out procedure is as follows:
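As a generic illustration only (not the authors' ANTsR scripts), a leave-one-out loop can be sketched in Python. Here `train` and `predict` are hypothetical callables standing in for model construction and voxelwise prediction; in the two-stage setting described earlier, `train` would build both the Stage 1 and Stage 2 models from the training subjects.

```python
def leave_one_out(subjects, train, predict):
    """Hold out each subject in turn, train on the rest, predict the held-out one."""
    results = {}
    for held_out in subjects:
        training_set = [s for s in subjects if s != held_out]
        model = train(training_set)
        results[held_out] = predict(model, held_out)
    return results


# Toy stand-ins: the "model" simply records which subjects it was trained on.
outcome = leave_one_out(
    ["subject01", "subject02", "subject03"],
    train=lambda training_set: tuple(training_set),
    predict=lambda model, subject: (model, subject),
)
print(outcome["subject02"])
```

With 24 subjects this yields 24 training runs, each evaluated on a subject the model has never seen.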
\[ \text{sensitivity} = \frac{TP}{TP + FN}, \]
\[ \text{PPV} = \frac{TP}{TP + FP}, \]
\[ F_1 = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}. \]
\[ \text{relative volume difference} = \frac{V_{manual} - V_{predicted}}{V_{manual}}. \]
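The four measures above can be computed from a pair of binary lesion masks, as in the following Python sketch. It assumes that TP, FP, and FN are counts of true positive, false positive, and false negative voxels, that masks are flattened to lists of 0/1 values, and that volumes are expressed as voxel counts (which leaves the ratio unchanged when voxel size is constant). The function names and the use of NaN for undefined ratios are assumptions of this sketch.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def overlap_metrics(manual, predicted):
    """Sensitivity, PPV, F1, and relative volume difference for binary masks."""
    pairs = list(zip(manual, predicted))
    tp = sum(1 for m, p in pairs if m and p)
    fp = sum(1 for m, p in pairs if not m and p)
    fn = sum(1 for m, p in pairs if m and not p)
    v_manual = tp + fn      # lesion voxels in the manual mask
    v_predicted = tp + fp   # lesion voxels in the predicted mask
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "ppv": _ratio(tp, tp + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "relative_volume_difference": _ratio(v_manual - v_predicted, v_manual),
    }


# Toy masks: four manual lesion voxels, three predicted, two in common.
manual_mask = [1, 1, 1, 1, 0, 0]
predicted_mask = [1, 1, 0, 0, 1, 0]
print(overlap_metrics(manual_mask, predicted_mask))
```

Note that the relative volume difference is signed: it is positive when the automated segmentation underestimates the manual lesion volume and negative when it overestimates it.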
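sensitivity = 0.70 \(\pm\) 0.34, PPV = 0.42 \(\pm\) 0.36, \(F_1\) = 0.47 \(\pm\) 0.36, relative volume difference = 43 \(\pm\) 38%
sensitivity = 0.68 \(\pm\) 0.38, PPV = 0.51 \(\pm\) 0.40, \(F_1\) = 0.52 \(\pm\) 0.36, relative volume difference = 43 \(\pm\) 26%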
After performing the leave-one-out evaluation, we calculated the MeanDecreaseAccuracy feature values for each of the 24 subjects \(\times\) 2 models per subject \(=48\) total models. This measure (per feature, per model) is calculated during the out-of-bag phase of the random forest model construction and quantifies the decrease in prediction accuracy that results from randomly permuting the values of the specified feature. In other words, this quantity helps determine the importance of a particular feature and, although we save such efforts for future work, this information provides us with guidance for future feature pruning and/or additions.
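The idea behind this kind of importance measure can be illustrated with a simplified Python sketch. It is not the randomForest implementation, which evaluates each tree on its own out-of-bag samples and averages the result over trees; here a single hypothetical `predict` function is scored on one data set before and after shuffling one feature column. All names are illustrative assumptions.

```python
import random


def permutation_importance(predict, rows, labels, feature_index, seed=0):
    """Drop in accuracy after shuffling one feature column across samples."""

    def accuracy(data):
        correct = sum(1 for row, y in zip(data, labels) if predict(row) == y)
        return correct / len(labels)

    baseline = accuracy(rows)
    column = [row[feature_index] for row in rows]
    random.Random(seed).shuffle(column)
    permuted = [
        row[:feature_index] + [value] + row[feature_index + 1:]
        for row, value in zip(rows, column)
    ]
    return baseline - accuracy(permuted)


# Toy classifier that looks only at feature 0 (e.g., a FLAIR-like intensity).
def toy_predict(row):
    return int(row[0] > 0.5)


samples = [[0.9, 1.0], [0.1, 2.0], [0.8, 3.0], [0.2, 4.0]]
truth = [1, 0, 1, 0]
print(permutation_importance(toy_predict, samples, truth, feature_index=1))
```

A feature the model never uses, such as feature 1 in the toy example, shows no drop in accuracy when shuffled, whereas shuffling an informative feature typically degrades accuracy.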
The resulting rankings for both stages are given in Figures 6 and 7, where the values for the separate stages are averaged over the entire corresponding model set. In addition, we track the variance for each feature over all models to illustrate the stability of the chosen features during the evaluation. This latter information is illustrated as horizontal error bars providing the \(95^{th}\) percentile. Note that the reader can cross reference Table 1 for identifying corresponding feature types and names.
Regarding the feature rankings, it is interesting to note some of the other top performing features for Stage 1. The contralateral difference FLAIR image is highly discriminative over the set of evaluation random forest models (see Figure 8). This accords with the known clinical relevance of FLAIR images for identifying white matter hyperintensities and the fact that such pathology does not typically manifest symmetrically in both hemispheres. Interestingly, the posterior maps for the deep gray matter are extremely important for accurate white matter hyperintensity segmentation. Perhaps the spatial specification of deep gray matter aids in the removal of false positives. Inspection of the bottom of the plots demonstrates the lack of discriminating features associated with the T1 image which is also well-known in the clinical literature.
As described earlier, for Stage 2, we used the output random forest voting maps from Stage 1 both as features themselves and as priors for input to a Bayesian-based segmentation with an additional MRF spatial prior. In Figure 7, the voting maps are labeled as “RFStage1VotingMaps” where the final numeral is associated with the brain parenchymal labeling given previously. Similarly, the additional RF prior segmentation feature probability maps are labeled as “RFBrainSegmentationPosteriors”. The Stage 2 feature importance plot follows trends similar to those for Stage 1, with the T1 images not contributing much to the identification of white matter hyperintensity voxels. The initial voting maps from Stage 1 are extremely important, with the top 3 being the estimated locations of the 1) gray matter, 2) white matter, and 3) white matter hyperintensities. Since these tissue types can be conflated based on intensity alone, it is intuitive that such features would be important.
The current communication describes a supervised statistical learning methodology for identifying WMHs within multimodal MR brain imaging. This effort utilized information acquired from the manual segmentation of WMHs from FLAIR images to help build two-stage ensembles of decision trees for the automated identification of these lesions. Although only a single expert was used to produce the manual labelings, our intent is to further refine the proposed paradigm by crowdsourcing with feedback from other experts who interact with both the data and methodology. Also, we recognize that only a single site was used for evaluating the proposed framework. However, we are currently processing other site data with the models developed for this work and the results look promising since the developed features are site-agnostic.
As far as we know, this is the first report utilizing a novel random forest approach to identify WMHs in a cohort of TBI patients. TBI WMHs tend to be more difficult to segment than MS lesions as the former tend to be smaller with an overall smaller lesion load. Also, enhancement protocols with the former tend to be less successful than with the latter. As mentioned previously, the work in MS lesion segmentation is extensive with a handful of techniques being publicly available.
Two major meta-analyses of WMHs have been published covering the periods prior to [39] and after 2010 [40]. Debette & Markus [39] found that the presence of WMHs was related to subsequent cognitive decline and to a higher risk of dementia, stroke, and mortality. Lesion volume at baseline was also predictive of cognitive decline. Kloppenborg et al. [40], in a meta-analysis of 23 cross-sectional studies reporting MRI and concurrent neuropsychological results in patients with heterogeneous diagnoses but without previously diagnosed cognitive impairment, found that WMHs were associated with cognitive deficit (effect size of -0.10, 95% CI: -0.13 to -0.08) after controlling for age.
Despite the potential clinical significance of WMHs, these lesions receive little attention in current clinical workflows. When reported in a standard neuroradiologist interpretation, they are typically handled as incidental findings and are assigned little clinical significance. This likely reflects the impracticality of performing a detailed assessment of number, volume, and distribution within a qualitative neuroradiologist interpretation, as well as the lack of correlative information on how the presence and distribution of these lesions may inform a diagnosis and prognosis in the appropriate clinical setting. To date, automated or semi-automated tools for the detection of WMHs have lacked the specificity and efficiency needed to mine large-scale datasets and generate highly granular data on whether these lesions possess any true diagnostic or prognostic value in the setting of a specific disease process. The present communication describes a supervised statistical learning tool that is appropriate for application to such large-scale datasets.
The authors wish to acknowledge all other members of the CENC Neuroimaging Steering Committee and CENC leadership (Drs. David X. Cifu, Ramon Diaz-Arrastia, and Rick Williams) for their support. We also gratefully acknowledge the assistance of Tracy Nolen, Chris Siege and Kevin Wilson. We would also like to thank the study participants and their family members. This project was jointly supported by the Department of Defense (W81XWH-13-2-0095), the U.S. Department of Veterans Affairs (I01 CX001135 and I01 RX 002174), as well as USUHS Grant HU 0001-08-0001.
The authors report no financial disclosures or conflicts of interest. The views expressed here are those of the authors and do not necessarily reflect the official policy or position of the Department of the Navy, Department of Defense, nor the U.S. Government. This work was prepared as a part of official duties; Title 17 USC §105 provides that Copyright protection under this title is not available for any work of the U.S. Government. Title 17 USC §101 defines a US Government work as a work prepared by a military service member or employee of the US Government as part of that person’s official duties.
1. Bigler ED, Abildskov TJ, Petrie J, Farrer TJ, Dennis M, Simic N, Taylor HG, Rubin KH, Vannatta K, Gerhardt CA, et al. Heterogeneity of brain lesions in pediatric traumatic brain injury. Neuropsychology. 2013;27(4):438–51.
2. Smitherman E, Hernandez A, Stavinoha PL, Huang R, Kernie SG, Diaz-Arrastia R, Miles DK. Predicting outcome after pediatric traumatic brain injury by early magnetic resonance imaging lesion location and volume. J Neurotrauma. 2016;33(1):35–48.
3. Marquez de la Plata C, Ardelean A, Koovakkattu D, Srinivasan P, Miller A, Phuong V, Harper C, Moore C, Whittemore A, Madden C, et al. Magnetic resonance imaging of diffuse axonal injury: Quantitative assessment of white matter lesion volume. J Neurotrauma. 2007;24(4):591–8.
4. Moen KG, Brezova V, Skandsen T, Håberg AK, Folvik M, Vik A. Traumatic axonal injury: The prognostic value of lesion load in corpus callosum, brain stem, and thalamus in different magnetic resonance imaging sequences. J Neurotrauma. 2014;31(17):1486–96.
5. Ding K, Marquez de la Plata C, Wang JY, Mumphrey M, Moore C, Harper C, Madden CJ, McColl R, Whittemore A, Devous MD, et al. Cerebral atrophy after traumatic white matter injury: Correlation with acute neuroimaging and outcome. J Neurotrauma. 2008;25(12):1433–40.
6. Pierallini A, Pantano P, Fantozzi LM, Bonamini M, Vichi R, Zylberman R, Pisarri F, Colonnese C, Bozzao L. Correlation between MRI findings and long-term outcome in patients with severe brain trauma. Neuroradiology. 2000;42(12):860–7.
7. Weiss N, Galanaud D, Carpentier A, Tezenas de Montcel S, Naccache L, Coriat P, Puybasset L. A combined clinical and MRI approach for outcome assessment of traumatic head injured comatose patients. J Neurol. 2008;255(2):217–23.
8. Levin HS, Williams D, Crofford MJ, High WM Jr, Eisenberg HM, Amparo EG, Guinto FC Jr, Kalisky Z, Handel SF, Goldman AM. Relationship of depth of brain lesions to consciousness and outcome after closed head injury. J Neurosurg. 1988;69(6):861–6.
9. Breiman L. Random forests. In: Machine learning. 2001. pp. 5–32.
10. Yi Z, Criminisi A, Shotton J, Blake A. Discriminative, semantic segmentation of brain tissue in MR images. Med Image Comput Comput Assist Interv. 2009;12(Pt 2):558–65.
11. Viola P, Jones M, Snow D. Detecting pedestrians using patterns of motion and appearance. International Journal of Computer Vision. 2005;63:153–161.
12. Geremia E, Clatz O, Menze BH, Konukoglu E, Criminisi A, Ayache N. Spatial decision forests for MS lesion segmentation in multi-channel magnetic resonance images. Neuroimage. 2011;57(2):378–90.
13. Pustina D, Coslett HB, Turkeltaub PE, Tustison N, Schwartz MF, Avants B. Automated segmentation of chronic stroke lesions using LINDA: Lesion identification with neighborhood data analysis. Hum Brain Mapp. 2016 Jan.
14. Geremia E, Menze BH, Ayache N. Spatial decision forests for glioma segmentation in multi-channel MR images. In: Proceedings of MICCAI-BRATS 2012. 2012.
15. Bauer S, Fejes T, Slotboom J, Wiest R, Nolte L-P, Reyes M. Segmentation of brain tumor images based on integrated hierarchical classification and regularization. In: Proceedings of MICCAI-BRATS 2012. 2012. pp. 10–13.
16. Zikic D, Glocker B, Konukoglu E, Shotton J, Criminisi A, Ye DH, Demiralp C, Thomas OM, Das T, Jena R, et al. Context-sensitive classification forests for segmentation of brain tumor tissues. In: Proceedings of MICCAI-BRATS 2012. 2012. pp. 1–9.
17. Tustison NJ, Shrinidhi KL, Wintermark M, Durst CR, Kandel BM, Gee JC, Grossman MC, Avants BB. Optimal symmetric multimodal templates and concatenated random forests for supervised brain tumor segmentation (simplified) with ANTsR. Neuroinformatics. 2015;13(2):209–25.
18. Schapire R. The strength of weak learnability. Machine Learning. 1990;5:197–227.
19. Freund Y, Schapire R. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences. 1997;55:119–139.
20. Ho TK. Random decision forests. In: Proceedings of the Third International Conference on Document Analysis and Recognition. Vol. 1. 1995. pp. 278–282.
21. Amit Y, Geman D. Shape quantization and recognition with randomized trees. Neural Computation. 1997;9:1545–1588.
22. Avants BB, Tustison NJ, Stauffer M, Song G, Wu B, Gee JC. The Insight ToolKit image registration framework. Front Neuroinform. 2014;8:44.
23. Yushkevich PA, Piven J, Hazlett HC, Smith RG, Ho S, Gee JC, Gerig G. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. Neuroimage. 2006;31(3):1116–28.
24. Tustison NJ, Cook PA, Klein A, Song G, Das SR, Duda JT, Kandel BM, Strien N van, Stone JR, Gee JC, et al. Large-scale evaluation of ANTs and FreeSurfer cortical thickness measurements. Neuroimage. 2014;99:166–79.
25. Landman BA, Huang AJ, Gifford A, Vikram DS, Lim IAL, Farrell JAD, Bogovic JA, Hua J, Chen M, Jarso S, et al. Multi-parametric neuroimaging reproducibility: A 3-T resource study. Neuroimage. 2011;54(4):2854–66.
26. Avants BB, Yushkevich P, Pluta J, Minkoff D, Korczykowski M, Detre J, Gee JC. The optimal template effect in hippocampus studies of diseased populations. Neuroimage. 2010;49(3):2457–66.
27. Neema M, Guss ZD, Stankiewicz JM, Arora A, Healy BC, Bakshi R. Normal findings on brain fluid-attenuated inversion recovery MR images at 3T. AJNR Am J Neuroradiol. 2009;30(5):911–6.
28. Tustison NJ, Avants BB, Cook PA, Zheng Y, Egan A, Yushkevich PA, Gee JC. N4ITK: Improved N3 bias correction. IEEE Trans Med Imaging. 2010;29(6):1310–20.
29. Manjón JV, Coupé P, Martí-Bonmatí L, Collins DL, Robles M. Adaptive non-local means denoising of MR images with spatially varying noise levels. J Magn Reson Imaging. 2010;31(1):192–203.
30. Nyúl LG, Udupa JK, Zhang X. New variants of a method of MRI scale standardization. IEEE Trans Med Imaging. 2000;19(2):143–50.
31. Avants BB, Tustison NJ, Wu J, Cook PA, Gee JC. An open source multivariate framework for \(n\)-tissue segmentation with evaluation on public data. Neuroinformatics. 2011;9(4):381–400.
32. Menze BH, Jakab A, Bauer S, Kalpathy-Cramer J, Farahani K, Kirby J, Burren Y, Porz N, Slotboom J, Wiest R, et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging. 2015;34(10):1993–2024.
33. Maurer CR, Rensheng Q, Raghavan V. A linear time algorithm for computing exact Euclidean distance transforms of binary images in arbitrary dimensions. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 2003;25(2):265–270.
34. Anbeek P, Vincken KL, Osch MJP van, Bisschops RHC, Grond J van der. Probabilistic segmentation of white matter lesions in MR imaging. Neuroimage. 2004;21(3):1037–44.
35. Tustison NJ, Avants BB. Explicit B-spline regularization in diffeomorphic image registration. Front Neuroinform. 2013;7:39.
36. Avants BB, Tustison NJ, Song G, Cook PA, Klein A, Gee JC. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage. 2011;54(3):2033–44.
37. García-Lorenzo D, Francis S, Narayanan S, Arnold DL, Collins DL. Review of automatic segmentation methods of multiple sclerosis white matter lesions on conventional magnetic resonance imaging. Med Image Anal. 2013;17(1):1–18.
38. Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2/3:18–22.
39. Debette S, Markus HS. The clinical importance of white matter hyperintensities on brain magnetic resonance imaging: Systematic review and meta-analysis. BMJ. 2010;341:c3666.
40. Kloppenborg RP, Nederkoorn PJ, Geerlings MI, Berg E van den. Presence and progression of white matter hyperintensities and cognition: A meta-analysis. Neurology. 2014;82(23):2127–38.
41. Ginneken BV, Heimann T, Styner M. 3D segmentation in the clinic: A grand challenge. In: 3D segmentation in the clinic: A grand challenge. 2007. pp. 7–15.